Cross-Modal Prototype Driven Network for Radiology Report Generation
Authors
Abstract
Radiology report generation (RRG) aims to automatically describe a radiology image in human-like language and could support the work of radiologists by reducing the burden of manual reporting. Previous approaches often adopt an encoder-decoder architecture and focus on single-modal feature learning, while few studies explore cross-modal interaction. Here we propose a Cross-modal PROtotype driven NETwork (XPRONET) to promote cross-modal pattern learning and exploit it to improve the task of report generation. This is achieved by three well-designed, fully differentiable and complementary modules: a shared cross-modal prototype matrix to record the prototypes; a cross-modal prototype network to learn the prototypes and embed cross-modal information into the visual and textual features; and an improved multi-label contrastive loss to enable and enhance prototype learning. XPRONET obtains substantial improvements on the IU-Xray and MIMIC-CXR benchmarks, where its performance exceeds recent state-of-the-art approaches by a large margin on IU-Xray and is comparable on MIMIC-CXR. The code is publicly available at https://github.com/Markin-Wang/XProNet.
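As a rough illustration of the idea of one prototype matrix shared by both modalities, the sketch below lets a visual and a textual feature vector query the same set of prototype rows and fold the result back into the feature. It is not the authors' implementation: the dot-product softmax weighting, the residual addition, and all names (`query_prototypes`, `enrich`, `shared_prototypes`) are assumptions chosen for brevity, and the toy vectors are made up.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def query_prototypes(feature, prototypes):
    # Assumption: similarity is a plain dot product, normalised with softmax.
    weights = softmax([dot(feature, p) for p in prototypes])
    dim = len(feature)
    # Similarity-weighted mix of the prototype rows.
    response = [sum(w * p[i] for w, p in zip(weights, prototypes)) for i in range(dim)]
    return weights, response

def enrich(feature, prototypes):
    # Assumption: the prototype response is fused by simple residual addition.
    _, response = query_prototypes(feature, prototypes)
    return [f + r for f, r in zip(feature, response)]

# Hypothetical toy data: one matrix serves both modalities.
shared_prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
visual_feature = [2.0, 0.1, 0.0]
textual_feature = [1.8, 0.0, 0.2]

visual_enriched = enrich(visual_feature, shared_prototypes)
textual_enriched = enrich(textual_feature, shared_prototypes)
```

Because both features query the same rows, a visual and a textual feature that describe the same pattern end up weighting the same prototype most heavily, which is the intuition behind recording cross-modal patterns in a single matrix.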
Similar resources
MHTN: Modal-adversarial Hybrid Transfer Network for Cross-modal Retrieval
Cross-modal retrieval has drawn wide interest for retrieval across different modalities of data (such as text, image, video, audio and 3D model). However, existing methods based on deep neural network (DNN) often face the challenge of insufficient cross-modal training data, which limits the training effectiveness and easily leads to overfitting. Transfer learning is usually adopted for relievin...
Attribute-Guided Network for Cross-Modal Zero-Shot Hashing
Zero-Shot Hashing aims at learning a hashing model that is trained only by instances from seen categories but can generalize well to those of unseen categories. Typically, it is achieved by utilizing a semantic embedding space to transfer knowledge from seen domain to unseen domain. Existing efforts mainly focus on single-modal retrieval task, especially Image-Based Image Retrieval (IBIR). Howeve...
Correlation Hashing Network for Efficient Cross-Modal Retrieval
Due to the storage and retrieval efficiency, hashing has been widely deployed to approximate nearest neighbor search for large-scale multimedia retrieval. Cross-modal hashing, which improves the quality of hash coding by exploiting the semantic correlation across different modalities, has received increasing attention recently. For most existing cross-modal hashing methods, an object is first r...
Informatics in Radiology (infoRAD): radiology report entry with automatic phrase completion driven by language modeling.
Keyboard entry or correction of radiology reports by radiologists and transcriptionists remains necessary in many settings despite advances in computerized speech recognition. A report entry system that implements an automated phrase completion feature based on language modeling was developed and tested. The special text editor uses context to predict the full word or phrase being typed, updati...
Cross-Modal Manifold Learning for Cross-modal Retrieval
This paper presents a new scalable algorithm for cross-modal similarity preserving retrieval in a learnt manifold space. Unlike existing approaches that compromise between preserving global and local geometries, the proposed technique respects both simultaneously during manifold alignment. The global topologies are maintained by recovering underlying mapping functions in the joint manifold spac...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2022
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-19833-5_33